AITopics

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Neural Information Processing Systems, Feb-11-2026, 02:21:45 GMT

A Appendix: A.1 Message Passing in SyncTREE

It should be noted that we made only a minor modification to the GraphTrans model. For NTREE, we set GAT as its basic block, with a dropout probability of 0.2 between layers.

artificial intelligence, dataset, machine learning, (17 more...)

Technology:

Information Technology > Architecture > Distributed Systems (0.45)
Information Technology > Artificial Intelligence > Machine Learning (0.31)

arXiv.org Artificial Intelligence, Oct-17-2025

ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning

Liang, Yichao, Nguyen, Dat, Yang, Cambridge, Li, Tianyang, Tenenbaum, Joshua B., Rasmussen, Carl Edward, Weller, Adrian, Tavares, Zenna, Silver, Tom, Ellis, Kevin

Long-horizon embodied planning is challenging because the world does not only change through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic cause-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.

artificial intelligence, bayesian inference, robot, (18 more...)

2509.26255

Genre:

Research Report (0.63)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
(2 more...)

Neural Information Processing Systems, Oct-11-2025, 00:37:25 GMT

b5e5a6c0ab7078e5c21e7c9e46360480-Paper-Conference.pdf

algorithm, assumption 4, bounded regret, (16 more...)

Neural Information Processing Systems, Oct-8-2025, 13:49:23 GMT

435e8fbbfc2c6072d4f3a5cb6e56a39a-Supplemental-Conference.pdf

dataset, rc tree, risc-v dataset, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Liu, Zhongxuan, Kang, Yue, Lee, Thomas C. M.

Lipschitz Bandits with Stochastic Delayed Feedback

arXiv.org Machine Learning, Oct-2-2025

The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit in the presence of stochastic delayed feedback, where the rewards are not observed immediately but after a random delay. We consider both bounded and unbounded stochastic delays, and design algorithms that attain sublinear regret guarantees in each setting. For bounded delays, we propose a delay-aware zooming algorithm that retains the optimal performance of the delay-free setting up to an additional term that scales with the maximal delay $\tau_{\max}$. For unbounded delays, we propose a novel phased learning strategy that accumulates reliable feedback over carefully scheduled intervals, and establish a regret lower bound showing that our method is nearly optimal up to logarithmic factors. Finally, we present experimental results to demonstrate the efficiency of our algorithms under various delay scenarios.

artificial intelligence, data mining, machine learning, (19 more...)

2510.00309

Country: North America > United States > California > Yolo County > Davis (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.68)

arXiv.org Artificial Intelligence, Sep-26-2025

Model-Based Reinforcement Learning under Random Observation Delays

Karamzade, Armin, Kim, Kyungmin, Lanier, JB, Corsi, Davide, Fox, Roy

Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in POMDPs, where observations may arrive out-of-sequence, a setting that has not been previously addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a model-based filtering process that sequentially updates the belief state based on an incoming stream of observations. We then introduce a simple delay-aware framework that incorporates this idea into model-based RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach to delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to delay distribution shifts during deployment. Additionally, we present experiments on simulated robotic tasks, comparing our method to common practical heuristics and emphasizing the importance of explicitly modeling observation delays.

machine learning, reinforcement learning, world model, (15 more...)

2509.20869

Country: North America (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Moghimi, Mohammadali, Jose, Sharu Theresa, Moothedath, Shana

Neural Contextual Bandits Under Delayed Feedback Constraints

arXiv.org Artificial Intelligence, Apr-17-2025

This paper presents a new algorithm for neural contextual bandits (CBs) that addresses the challenge of delayed reward feedback, where the reward for a chosen action is revealed after a random, unknown delay. This scenario is common in applications such as online recommendation systems and clinical trials, where reward feedback is delayed because the outcomes of a user's actions (such as recommendations or treatment responses) take time to manifest and be measured. The proposed algorithm, called Delayed NeuralUCB, uses an upper confidence bound (UCB)-based exploration strategy. We further consider a variant of the algorithm, called Delayed NeuralTS, that uses Thompson Sampling-based exploration. Numerical experiments on real-world datasets, such as MNIST and Mushroom, along with comparisons to benchmark approaches, demonstrate that the proposed algorithms effectively manage varying delays and are well-suited for complex real-world scenarios. The stochastic contextual bandit (CB) problem has gained immense interest in recent years due to its applications in various domains, including healthcare, finance, and recommender systems [1]-[5]. The CB is a sequential decision-making problem where, in each round, the agent (or decision-maker) is presented with K actions and associated contextual information.
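To make the delayed-feedback setting concrete, here is a minimal sketch of UCB-style exploration in which rewards only become usable after their delay has elapsed. It is not the paper's Delayed NeuralUCB: it swaps in plain (non-neural, non-contextual) UCB1, and the class name `DelayedUCB1`, its methods, and the convention that a reward recorded in round t with delay d becomes usable from round t + d + 1 are all assumptions made for illustration.

```python
import heapq
import math


class DelayedUCB1:
    """Plain UCB1 that updates only with rewards whose delay has elapsed.

    Hypothetical illustration; not the neural algorithm from the abstract.
    """

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # number of rewards that have arrived, per arm
        self.sums = [0.0] * n_arms   # sum of arrived rewards, per arm
        self.pending = []            # min-heap of (arrival_round, arm, reward)
        self.t = 0                   # current round

    def _absorb_arrived(self):
        # Move every reward whose arrival round has passed into the statistics.
        while self.pending and self.pending[0][0] < self.t:
            _, arm, reward = heapq.heappop(self.pending)
            self.counts[arm] += 1
            self.sums[arm] += reward

    def select(self):
        self.t += 1
        self._absorb_arrived()
        # Play any arm that has no observed feedback yet.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)

        def ucb(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=ucb)

    def record(self, arm, reward, delay):
        # The reward is queued and stays invisible until its delay has elapsed.
        heapq.heappush(self.pending, (self.t + delay, arm, reward))
```

A caller would alternate `select()` and `record(arm, reward, delay)` each round. Note that under this simple rule an arm with no arrived feedback is pulled repeatedly until its first reward shows up, which is one of the effects a delay-aware method has to account for.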

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2504.12086

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)

arXiv.org Artificial Intelligence, Feb-21-2025

Orthogonal Calibration for Asynchronous Federated Learning

Zhang, Jiayun, Li, Shuheng, Huang, Haiyu, Yu, Xiaofan, Gupta, Rajesh K., Shang, Jingbo

Asynchronous federated learning mitigates the inefficiency of conventional synchronous aggregation by integrating updates as they arrive and adjusting their influence based on staleness. Due to asynchrony and data heterogeneity, learning objectives at the global and local levels are inherently inconsistent -- global optimization trajectories may conflict with ongoing local updates. Existing asynchronous methods simply distribute the latest global weights to clients, which can overwrite local progress and cause model drift. In this paper, we propose OrthoFL, an orthogonal calibration framework that decouples global and local learning progress and adjusts global shifts to minimize interference before merging them into local models. In OrthoFL, clients and the server maintain separate model weights. Upon receiving an update, the server aggregates it into the global weights via a moving average. For client weights, the server computes the global weight shift accumulated during the client's delay and removes the components aligned with the direction of the received update. The resulting parameters lie in a subspace orthogonal to the client update and preserve the maximal information from the global progress. The calibrated global shift is then merged into the client weights for further training. Extensive experiments show that OrthoFL improves accuracy by 9.6% and achieves a 12$\times$ speedup compared to synchronous methods. Moreover, it consistently outperforms state-of-the-art asynchronous baselines under various delay patterns and heterogeneity scenarios.
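The calibration step described above (removing from the accumulated global shift the component aligned with the received client update) can be sketched as a vector projection. This is a rough illustration only: it assumes weights are flattened into a single list of floats and that the full projection is removed, whereas the paper may operate per layer or treat the sign of the alignment differently. The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_aligned_component(global_shift, client_update):
    """Project global_shift onto the subspace orthogonal to client_update.

    Assumption: both arguments are flattened weight vectors of equal length.
    """
    norm_sq = dot(client_update, client_update)
    if norm_sq == 0.0:
        # A zero update defines no direction, so nothing is removed.
        return list(global_shift)
    scale = dot(global_shift, client_update) / norm_sq
    return [g - scale * c for g, c in zip(global_shift, client_update)]


def merge_into_client(client_weights, calibrated_shift):
    """Add the calibrated global shift to the client's weights."""
    return [w + s for w, s in zip(client_weights, calibrated_shift)]
```

For example, with a global shift of `[3.0, 4.0]` and a client update along `[1.0, 0.0]`, the calibrated shift keeps only the second coordinate, so merging it does not move the client along the direction it just trained in.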

client update, latency, learning, (13 more...)

2502.1594

Country:

North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > Virginia (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report (0.64)

Industry: Education (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial Intelligence, Dec-23-2024

Age Optimal Sampling for Unreliable Channels under Unknown Channel Statistics

He, Hongyi, Tang, Haoyue, Pan, Jiayu, Wang, Jintao, Song, Jian, Tassiulas, Leandros

In this paper, we study a system in which a sensor forwards status updates to a receiver through an error-prone channel, while the receiver sends the transmission results back to the sensor via a reliable channel. Both channels are subject to random delays. To evaluate the timeliness of the status information at the receiver, we use the Age of Information (AoI) metric. The objective is to design a sampling policy that minimizes the expected time-average AoI, even when the channel statistics (e.g., delay distributions) are unknown. We first review the threshold structure of the optimal offline policy under known channel statistics and then reformulate the design of the online algorithm as a stochastic approximation problem. We propose a Robbins-Monro algorithm to solve this problem and demonstrate that the optimal threshold can be approximated almost surely. Moreover, we prove that the cumulative AoI regret of the online algorithm grows at rate $\mathcal{O}(\ln K)$, where $K$ is the number of successful transmissions. In addition, our algorithm is shown to be minimax order optimal, in the sense that for any online learning algorithm, the cumulative AoI regret up to the $K$-th successful transmission grows at rate at least $\Omega(\ln K)$ under the worst-case delay distribution. Finally, we improve the stability of the proposed online learning algorithm through a momentum-based stochastic gradient descent algorithm. Simulation results validate the performance of our proposed algorithm.
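For readers unfamiliar with the Robbins-Monro scheme mentioned above, the following is a generic sketch of the iteration: it seeks a root of an expectation using only noisy samples and a decaying step size. It does not reproduce the paper's AoI threshold update; the function names, the 1/k step size, and the toy root-finding target are assumptions chosen for illustration.

```python
def robbins_monro(samples, noisy_g, theta0=0.0):
    """Generic Robbins-Monro iteration: theta <- theta + a_k * g(theta, x_k).

    Assumption: step sizes a_k = 1/k, which satisfy the usual conditions
    (sum of a_k diverges, sum of a_k squared converges).
    """
    theta = theta0
    for k, x in enumerate(samples, start=1):
        step = 1.0 / k
        theta = theta + step * noisy_g(theta, x)
    return theta


def mean_residual(theta, x):
    """Toy target: the root of E[x - theta] = 0 is the mean of x."""
    return x - theta
```

With this toy target and step size, the iterate after k samples equals their running average, so `robbins_monro([1.0, 2.0, 3.0, 4.0], mean_residual)` approaches 2.5. In the setting of the abstract, the sampled quantity would instead come from observed transmissions and the root would be the optimal sampling threshold.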

algorithm, inequality, online algorithm, (15 more...)

2412.18119

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(8 more...)

Genre: Research Report (0.50)

Industry: Education > Educational Setting (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)